Predicting Protein Folds with Fold-Specific PSSM Libraries
نویسندگان
چکیده
Accurately assigning folds for divergent protein sequences is a major obstacle to structural studies. Herein, we outline an effective method for fold recognition using sets of PSSMs, each of which is constructed for different protein folds. Our analyses demonstrate that FSL (Fold-specific Position Specific Scoring Matrix Libraries) can predict/relate structures given only their amino acid sequences of highly divergent proteins. This ability to detect distant relationships is dependent on low-identity sequence alignments obtained from FSL. Results from our experiments demonstrate that FSL perform well in recognizing folds from the "twilight-zone" SABmark dataset. Further, this method is capable of accurate fold prediction in newly determined structures. We suggest that by building complete PSSM libraries for all unique folds within the Protein Database (PDB), FSL can be used to rapidly and reliably annotate a large subset of protein folds at proteomic level. The related programs and fold-specific PSSMs for our FSL are publicly available at: http://ccp.psu.edu/download/FSLv1.0/.
منابع مشابه
Towards Solving the Inverse Protein Folding Problem
Accurately assigning folds for divergent protein sequences is a major obstacle to structural studies and underlies the inverse protein folding problem. Herein, we outline our theories for fold-recognition in the “twilight-zone” of sequence similarity (<25% identity). Our analyses demonstrate that structural sequence profiles built using Position-Specific Scoring Matrices (PSSMs) significantly o...
متن کاملUsing Support Vector Machine and Evolutionary Profiles to Predict Antifreeze Protein Sequences
Antifreeze proteins (AFPs) are ice-binding proteins. Accurate identification of new AFPs is important in understanding ice-protein interactions and creating novel ice-binding domains in other proteins. In this paper, an accurate method, called AFP_PSSM, has been developed for predicting antifreeze proteins using a support vector machine (SVM) and position specific scoring matrix (PSSM) profiles...
متن کاملAn assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life
Previous studies of protein fold space suggest that fold coverage is plateauing. However, sequence sampling has been -and remains to a large extent- heavily biased, focusing on culturable phyla. Sustained technological developments have fuelled the advent of metagenomics and single-cell sequencing, which might correct the current sequencing bias. The extent to which these efforts affect structu...
متن کاملQuery-seeded iterative sequence similarity searching improves selectivity 5–20-fold
Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic to a protein family. But models are subject to contamination; once an unrelated sequence has bee...
متن کاملA protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM
Protein function is related to its chemical reaction to the surrounding environment including other proteins. On the other hand, this depends on the spatial shape and tertiary structure of protein and folding of its constituent components in space. The correct identification of protein domain fold solely using extracted information from protein sequence is a complicated and controversial task i...
متن کامل